An Efficient Sequential Covering Algorithm for Explaining Subsets of Data

Authors

  • Matthew Michelson
  • Sofus A. Macskassy
Abstract

Given a subset of data that differs from the rest, a user often wants an explanation of why this is the case. For instance, in a database of flights, a user may want to understand why certain flights were very late. This paper presents ESCAPE, a sequential covering algorithm that generates explanations of subsets in the form of rules in disjunctive normal form. The rules describe the characteristics ({attribute, value} pairs) that differentiate the subsets from the rest of the data. Our experiments demonstrate that ESCAPE discovers explanations that are both compact, in that just a few rules cover the subset, and specific, in that the rules cover the subset but not the rest of the data. Our experiments compare ESCAPE to RIPPER, a popular, traditional rule learning algorithm, and show that ESCAPE’s rules yield better covering explanations. Further, ESCAPE was designed to be efficient, and we formally demonstrate that ESCAPE runs in log-linear time.
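To make the idea of a sequential covering explanation concrete, the following is a minimal sketch of a generic greedy sequential covering loop over {attribute, value} pairs. It is not ESCAPE itself: the abstract does not give the algorithm's details, so the scoring heuristic (precision, then coverage), the function names `learn_one_rule` and `sequential_covering`, and the toy flight records are all assumptions made for illustration. The sketch also makes no attempt to reach the log-linear running time stated in the abstract.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]
Condition = Tuple[str, str]    # one {attribute, value} pair
Rule = Tuple[Condition, ...]   # a conjunction of conditions


def matches(rule: Rule, row: Row) -> bool:
    """A row matches a rule when it satisfies every condition in it."""
    return all(row.get(attr) == value for attr, value in rule)


def learn_one_rule(targets: List[Row], rest: List[Row]) -> Rule:
    """Greedily add conditions until no row from `rest` is covered
    (or no candidate condition is left). Hypothetical heuristic."""
    rule: Rule = ()
    pos, neg = list(targets), list(rest)
    while neg:
        # Candidate conditions come from the target rows still covered.
        candidates = sorted(
            {(a, v) for row in pos for a, v in row.items()} - set(rule)
        )
        if not candidates:
            break  # the remaining rows cannot be told apart

        def score(cond: Condition) -> Tuple[float, int]:
            p = sum(1 for r in pos if r.get(cond[0]) == cond[1])
            n = sum(1 for r in neg if r.get(cond[0]) == cond[1])
            return (p / (p + n), p)  # prefer precise, then broad

        best = max(candidates, key=score)
        rule = rule + (best,)
        pos = [r for r in pos if matches((best,), r)]
        neg = [r for r in neg if matches((best,), r)]
    return rule


def sequential_covering(targets: List[Row], rest: List[Row]) -> List[Rule]:
    """Learn rules one at a time; the returned list is read as a disjunction."""
    rules: List[Rule] = []
    uncovered = list(targets)
    while uncovered:
        rule = learn_one_rule(uncovered, rest)
        rules.append(rule)
        # Remove the target rows the new rule explains, then repeat.
        uncovered = [r for r in uncovered if not matches(rule, r)]
    return rules


# Toy data (invented for illustration): late flights vs. the rest.
late = [
    {"carrier": "XX", "origin": "ORD", "weather": "snow"},
    {"carrier": "YY", "origin": "ORD", "weather": "snow"},
    {"carrier": "XX", "origin": "JFK", "weather": "clear"},
]
on_time = [
    {"carrier": "YY", "origin": "JFK", "weather": "clear"},
    {"carrier": "YY", "origin": "LAX", "weather": "clear"},
    {"carrier": "ZZ", "origin": "ORD", "weather": "clear"},
]

rules = sequential_covering(late, on_time)
```

On this toy data the resulting rule list is compact in the abstract's sense (few rules cover every late flight) and specific (no on-time flight matches any rule). With noisier data, a rule produced by this sketch may still cover some rows outside the subset when the two groups share identical attribute values.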


Similar resources

A TRUST-REGION SEQUENTIAL QUADRATIC PROGRAMMING WITH NEW SIMPLE FILTER AS AN EFFICIENT AND ROBUST FIRST-ORDER RELIABILITY METHOD

The real-world applications addressing the nonlinear functions of multiple variables could be implicitly assessed through structural reliability analysis. This study establishes an efficient algorithm for resolving highly nonlinear structural reliability problems. To this end, first a numerical nonlinear optimization algorithm with a new simple filter is defined to locate and estimate the most ...


Well-dispersed subsets of non-dominated solutions for MOMILP problem

This paper uses the weighted $L_1$-norm to propose an algorithm for finding a well-dispersed subset of non-dominated solutions of multiple objective mixed integer linear programming problem. When all variables are integer it finds the whole set of efficient solutions. In each iteration of the proposed method only a mixed integer linear programming problem is solved and its optimal solutions gen...


A Reliable Multi-objective p-hub Covering Location Problem Considering of Hubs Capabilities

In the facility location problem usually reducing total transferring cost and time are common objectives. Designing of a network with hub facilities can improve network efficiency. In this study a new model is presented for P-hub covering location problem. In the p-hub covering problem it is attempted to locate hubs and allocate customers to established hubs while allocated nodes to hubs are in...


An L1-norm method for generating all of efficient solutions of multi-objective integer linear programming problem

This paper extends the proposed method by Jahanshahloo et al. (2004) (a method for generating all the efficient solutions of a 0–1 multi-objective linear programming problem, Asia-Pacific Journal of Operational Research). This paper considers the recession direction for a multi-objective integer linear programming (MOILP) problem and presents necessary and sufficient conditions to have unbounde...


Efficient Solution Procedure to Develop Maximal Covering Location Problem Under Uncertainty (Using GA and Simulation)

In this paper, we present the stochastic version of Maximal Covering Location Problem which optimizes both location and allocation decisions, concurrently. It’s assumed that traveling time between customers and distribution centers (DCs) is uncertain and described by normal distribution function and if this time is less than coverage time, the customer can be allocated to DC. In classical mod...




Publication date: 2010